Note: We use the Snakemake workflow engine [1] to maintain reproducibility in
technical validation, regeneration of results, and improvement of the
microbiome bioinformatics analysis.
Improved Mothur Snakemake Workflows
imap-mothur
A workflow for microbial profiling using 16S rRNA gene markers.
We review existing workflows [2] and
create a more reproducible pipeline.
Each major step forms a separate Snakemake rule.
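To illustrate how separate rules fit together, the sketch below mimics in plain Python how a workflow engine such as Snakemake can derive the execution order from the input and output files that each rule declares. It is not Snakemake code and not the actual imap-mothur rule set: the rule names and file names are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical rules: each step declares the files it reads and writes.
# Names and paths are illustrative and are not taken from imap-mothur.
RULES = {
    "make_contigs": {"input": ["reads.files"], "output": ["contigs.fasta"]},
    "screen_seqs": {"input": ["contigs.fasta"], "output": ["good.fasta"]},
    "unique_seqs": {"input": ["good.fasta"], "output": ["unique.fasta"]},
    "classify_seqs": {"input": ["unique.fasta"], "output": ["taxonomy.txt"]},
}


def execution_order(rules):
    """Order rules so each one runs after the rules that produce its inputs."""
    # Map every output file to the rule that creates it.
    producer = {out: name for name, r in rules.items() for out in r["output"]}
    # For each rule, collect the rules it depends on through its input files.
    graph = {
        name: {producer[f] for f in r["input"] if f in producer}
        for name, r in rules.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = execution_order(RULES)
print(order)
```

Because each step is its own rule, changing one step only invalidates the steps that depend on its outputs, which is one reason this layout helps reproducibility.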
Workflow framework
Microbial profiling classification options
1. Operational Taxonomic Units (OTUs)
OTUs are clusters of similar sequences and are commonly accepted as
analytical units in microbial profiling when using 16S rRNA gene
markers.
2. Phylotype
A phylotype in microbiome research is a DNA sequence or group of
sequences sharing more than an arbitrarily chosen level of similarity of
a 16S rRNA gene marker.
3. Amplicon Sequence Variant (ASV)
An ASV in microbiome research is any single inferred DNA sequence
recovered from a bioinformatics analysis of 16S rRNA marker genes.
4. Microbial Phylogenies
Microbial phylogenies are inferred from gene sequence homologies. Models of
mutation are used to determine the most likely evolutionary histories.
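The options above rest on two ideas that a short example can make concrete: a distance between aligned sequences, and a similarity threshold for grouping them. The following is a minimal sketch, not mothur's implementation. It assumes pre-aligned sequences of equal length, uses the Jukes-Cantor (JC69) model as one simple example of a mutation model, and substitutes a greedy seed-based clustering for mothur's own clustering algorithms. The 0.03 distance cutoff and the toy sequences are assumptions for illustration.

```python
import math


def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """JC69-corrected distance for an observed proportion of differences p."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at or above saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def greedy_otus(seqs, cutoff=0.03):
    """Put each sequence in the first OTU whose seed is within the cutoff."""
    seeds, otus = [], []
    for name, seq in seqs.items():
        for i, seed in enumerate(seeds):
            if p_distance(seq, seed) <= cutoff:
                otus[i].append(name)
                break
        else:
            # No existing seed is close enough, so this sequence starts a new OTU.
            seeds.append(seq)
            otus.append([name])
    return otus


# Toy aligned sequences (hypothetical, 40 bases each).
base = "ACGT" * 10
toy = {
    "s1": base,
    "s2": "T" + base[1:],            # 1 mismatch in 40 positions
    "s3": "AAAA" * 5 + "ACGT" * 5,   # 15 mismatches in 40 positions
}
otus = greedy_otus(toy)
print(otus)
```

With these toy sequences, s1 and s2 fall in the same OTU and s3 forms its own. The corrected distance is always at least as large as the observed proportion, because the model accounts for repeated substitutions at the same site.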
Mothur-based RDP reference files [4]. Note: The
RDP database is used to classify 16S rRNA gene sequences to the genus level.
ZymoBIOMICS Microbial Community Standard (Cat # D6306) [5]. The ZymoBIOMICS Microbial
Community DNA Standard is designed to assess bias, errors and other
artifacts after the step of nucleic acid purification.
Sample location (demo)
Troubleshooting (in progress)
Mothur dist.seqs takes too long.
Merged reads are too long, probably over 300 bp.
Reads do not overlap when merging the paired reads.
Too many unique representative sequences, probably caused by a lack of
overlap.
Not enough computing power, which suggests using an HPC system or cluster.
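Two of the issues above can be estimated with simple arithmetic before running the workflow. The sketch below is a rough aid, not part of imap-mothur. It assumes that paired reads have equal length and that an all-against-all distance calculation visits every pair of unique sequences. The read lengths, amplicon lengths, and sequence count are hypothetical inputs.

```python
def expected_overlap(read_length, amplicon_length):
    """Bases shared by the forward and reverse reads; <= 0 means no overlap."""
    return 2 * read_length - amplicon_length


def pairwise_comparisons(n_unique):
    """Number of sequence pairs in an all-against-all distance calculation."""
    return n_unique * (n_unique - 1) // 2


# Hypothetical cases: a short amplicon with long reads, then a long
# amplicon with short reads.
print(expected_overlap(250, 253))      # 247
print(expected_overlap(150, 460))      # -160
print(pairwise_comparisons(100_000))   # 4999950000
```

A negative overlap means the paired reads cannot be merged reliably. The number of comparisons grows quadratically with the number of unique sequences, which is why poorly overlapping reads that inflate the unique count can also make the distance step slow.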
References
1. Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B.,
Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S.
O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S.
(2021). Sustainable data analysis with snakemake.
F1000Research, 10. https://doi.org/10.12688/f1000research.29032.2